Independent Prototype Propagation for Zero-Shot Compositionality
Humans are good at compositional zero-shot reasoning; someone who has never
seen a zebra before could nevertheless recognize one when we tell them it looks
like a horse with black and white stripes. Machine learning systems, on the
other hand, usually leverage spurious correlations in the training data, and
while such correlations can help recognize objects in context, they hurt
generalization. To be able to deal with underspecified datasets while still
leveraging contextual clues during classification, we propose ProtoProp, a
novel prototype propagation graph method. First we learn prototypical
representations of objects (e.g., zebra) that are conditionally independent
w.r.t. their attribute labels (e.g., stripes) and vice versa. Next we propagate
the independent prototypes through a compositional graph, to learn
compositional prototypes of novel attribute-object combinations that reflect
the dependencies of the target distribution. The method does not rely on any
external data, such as class hierarchy graphs or pretrained word embeddings. We
evaluate our approach on AO-CLEVr, a synthetic and strongly visual dataset
with clean labels, and UT-Zappos, a noisy real-world dataset of fine-grained
shoe types. We show that in the generalized compositional zero-shot setting we
outperform state-of-the-art results, and through ablations we show the
importance of each part of the method and their contribution to the final
results.
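The compositional step described above, in which two independent prototypes are propagated into a single node for an unseen attribute-object pair, can be sketched as follows. This is an illustrative simplification, not the authors' exact model; the prototypes, the projection W, and the function names are all assumed stand-ins.

```python
import numpy as np

# Hypothetical sketch of prototype propagation for an unseen composition.
rng = np.random.default_rng(0)
d = 4
# Independent prototypes for attributes and objects, assumed learned upstream.
attr_protos = {"striped": rng.normal(size=d), "plain": rng.normal(size=d)}
obj_protos = {"horse": rng.normal(size=d), "cat": rng.normal(size=d)}

def compose(attr, obj, W):
    """Propagate the two independent prototypes into one compositional node."""
    z = np.concatenate([attr_protos[attr], obj_protos[obj]])
    return np.tanh(W @ z)  # assumed learnable projection into prototype space

W = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)
# A compositional prototype for a pair never seen during training.
zebra_like = compose("striped", "horse", W)
print(zebra_like.shape)  # (4,)
```

In the paper the propagation runs over a compositional graph so that the learned prototypes reflect the dependencies of the target distribution; the single projection above only illustrates the composition of two independent parents.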
Recurrently Predicting Hypergraphs
This work considers predicting the relational structure of a hypergraph for a
given set of vertices, as common for applications in particle physics,
biological systems and other complex combinatorial problems. A problem arises
from the number of possible multi-way relationships, or hyperedges, scaling as
O(2^n) for a set of n elements. Simply storing an indicator tensor for all
relationships is already intractable for moderately sized n, prompting previous
approaches to restrict the number of vertices a hyperedge connects. Instead, we
propose a recurrent hypergraph neural network that predicts the incidence
matrix by iteratively refining an initial guess of the solution. We leverage
the property that most hypergraphs of interest are sparsely connected and
reduce the memory requirement to O(nk),
where k is the maximum number of positive edges, i.e., edges that actually
exist. In order to counteract the linearly growing memory cost from training a
lengthening sequence of refinement steps, we further propose an algorithm that
applies backpropagation through time on randomly sampled subsequences. We
empirically show that our method can match an increase in the intrinsic
complexity without a performance decrease and demonstrate superior performance
compared to state-of-the-art models.
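The memory argument in the abstract can be made concrete with a back-of-the-envelope comparison. The numbers below are illustrative, not taken from the paper: a full indicator tensor over all possible hyperedges of n vertices needs on the order of 2^n entries, while an incidence matrix restricted to at most k positive edges needs only n * k.

```python
# Illustrative memory comparison for hypergraph representations.
n, k = 30, 64

# Dense indicator over every possible hyperedge: one entry per vertex subset.
full = 2 ** n        # 1,073,741,824 entries: intractable to store

# Sparse incidence matrix over at most k positive (actually existing) edges.
sparse = n * k       # 1,920 entries

print(full, sparse)
# Training the refinement recurrence with backpropagation through time on
# randomly sampled subsequences (as proposed) keeps the *training* memory
# bounded as well, since gradients flow through a short window only.
```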
Self-Guided Diffusion Models
Diffusion models have demonstrated remarkable progress in image generation
quality, especially when guidance is used to control the generative process.
However, guidance requires a large amount of image-annotation pairs for
training and is thus dependent on their availability, correctness and
unbiasedness. In this paper, we eliminate the need for such annotation by
instead leveraging the flexibility of self-supervision signals to design a
framework for self-guided diffusion models. By leveraging a feature extraction
function and a self-annotation function, our method provides guidance signals
at various image granularities: from the level of holistic images to object
boxes and even segmentation masks. Our experiments on single-label and
multi-label image datasets demonstrate that self-labeled guidance always
outperforms diffusion models without guidance and may even surpass guidance
based on ground-truth labels, especially on unbalanced data. When equipped with
self-supervised box or mask proposals, our method further generates visually
diverse yet semantically consistent images, without the need for any class,
box, or segment label annotation. Self-guided diffusion is simple, flexible,
and expected to profit from deployment at scale.
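The guidance mechanism described above can be sketched in the style of classifier-free guidance, with the label replaced by a self-annotation derived from a feature extractor. All function names here are hypothetical stand-ins, and the combination rule is an assumption based on standard guided diffusion, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def feature_extract(x):
    # Stand-in for a self-supervised encoder (e.g. a pretrained backbone).
    return x.mean()

def self_annotate(feat, centers):
    # Stand-in self-annotation: assign the nearest-cluster pseudo-label.
    return int(np.argmin(np.abs(centers - feat)))

def guided_eps(eps_uncond, eps_cond, w=2.0):
    # Guidance steers the denoiser toward the (self-)conditioned prediction.
    return eps_uncond + w * (eps_cond - eps_uncond)

x = rng.normal(size=8)
label = self_annotate(feature_extract(x), centers=np.array([-1.0, 0.0, 1.0]))
eps = guided_eps(rng.normal(size=8), rng.normal(size=8))
print(label, eps.shape)
```

The same scheme extends to finer granularities by letting the self-annotation function emit box or mask proposals instead of a single pseudo-label.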
Incremental concept learning with few training examples and hierarchical classification
Object recognition and localization are important to automatically interpret video and allow better querying
on its content. We propose a method for object localization that learns incrementally and addresses four key
aspects. Firstly, we show that for certain applications, recognition is feasible with only a few training samples.
Secondly, we show that novel objects can be added incrementally without retraining existing objects, which is
important for fast interaction. Thirdly, we show that an unbalanced number of positive training samples leads
to biased classifier scores that can be corrected by modifying weights. Fourthly, we show that the detector
performance can deteriorate due to hard-negative mining for similar or closely related classes (e.g., for Barbie
and dress, because the doll is wearing a dress). This can be solved by our hierarchical classification. We introduce
a new dataset, which we call TOSO, and use it to demonstrate the effectiveness of the proposed method for the
localization and recognition of multiple objects in images. This research was performed in the GOOSE project, which is jointly funded by the enabling technology program
Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense.
This publication was supported by the research program Making Sense of Big Data (MSoBD).
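The third point above, correcting classifier bias caused by an unbalanced number of positive training samples, can be illustrated as follows. The abstract does not specify the exact weight modification, so the rescaling below is an assumed example of the general idea, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pos = np.array([500, 20, 5])   # positives per class: strongly unbalanced
W = rng.normal(size=(3, 4))      # per-class linear classifier weights

# Assumed correction: rescale each class's weight vector to unit norm so that
# classes trained on few positives are not systematically under- or over-scored.
W_corr = W / np.linalg.norm(W, axis=1, keepdims=True)

x = rng.normal(size=4)           # a feature vector for one detection window
scores = W_corr @ x              # three directly comparable class scores
print(scores.shape)
```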
Data Augmentations in Deep Weight Spaces
Learning in weight spaces, where neural networks process the weights of other
deep neural networks, has emerged as a promising research direction with
applications in various fields, from analyzing and editing neural fields and
implicit neural representations, to network pruning and quantization. Recent
works designed architectures for effective learning in that space, which takes
into account its unique, permutation-equivariant, structure. Unfortunately, so
far these architectures suffer from severe overfitting and were shown to
benefit from large datasets. This poses a significant challenge because
generating data for this learning setup is laborious and time-consuming since
each data sample is a full set of network weights that has to be trained. In
this paper, we address this difficulty by investigating data augmentations for
weight spaces, a set of techniques that enable generating new data examples on
the fly without having to train additional input weight space elements. We
first review several recently proposed data augmentation schemes and divide them into categories. We then introduce a novel
augmentation scheme based on the Mixup method. We evaluate the performance of
these techniques on existing benchmarks as well as new benchmarks we generate,
which can be valuable for future studies. Comment: Accepted to NeurIPS 2023 Workshop on Symmetry and Geometry in Neural
Representations
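A Mixup-style augmentation applied directly in weight space, as introduced above, can be sketched as a convex combination of two trained networks' flattened weights. This is a minimal sketch under the assumption that the two weight vectors are already aligned (e.g. up to neuron permutation); the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def mixup_weights(w1, w2, alpha=0.2):
    # Interpolate two weight vectors with lam ~ Beta(alpha, alpha), as in Mixup.
    lam = rng.beta(alpha, alpha)
    return lam * w1 + (1.0 - lam) * w2, lam

w_a = rng.normal(size=10)  # flattened weights of one trained network
w_b = rng.normal(size=10)  # flattened weights of another
w_mix, lam = mixup_weights(w_a, w_b)
print(w_mix.shape)
```

Because the weight space has a permutation-equivariant structure, a practical scheme would mix only after accounting for that symmetry; the sketch omits the alignment step.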
Recognition and localization of relevant human behavior in videos (SPIE)
Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system.